tf.train provides a set of classes and functions that help train models.
The Optimizer base class provides methods to compute gradients for a loss and apply gradients to variables. A collection of subclasses implements classic optimization algorithms such as GradientDescent and Adagrad.
You never instantiate the Optimizer class itself; instead you instantiate one of its subclasses (a usage sketch follows the list below).
tf.train.Optimizer // CLASS providing the API and ops to train a model. Always used through one of its subclasses.
# Create an optimizer with the desired parameters.
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
tf.train.GradientDescentOptimizer // Implements the gradient descent algorithm.
tf.train.AdadeltaOptimizer // Adadelta algorithm.
tf.train.AdagradOptimizer // Adagrad algorithm.
tf.train.AdagradDAOptimizer // Adagrad Dual Averaging (AdagradDA) algorithm; takes care of regularization within a minibatch. Used where large sparsity is needed; guarantees sparsity for linear models.
tf.train.MomentumOptimizer // Momentum algorithm.
tf.train.AdamOptimizer // Adam algorithm. https://arxiv.org/abs/1412.6980
tf.train.FtrlOptimizer // FTRL algorithm. https://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf
tf.train.ProximalGradientDescentOptimizer // Proximal gradient descent algorithm. http://papers.nips.cc/paper/3793-efficient-learning-using-forward-backward-splitting.pdf
tf.train.ProximalAdagradOptimizer // Proximal Adagrad algorithm.
tf.train.RMSPropOptimizer // RMSProp algorithm. http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
See tf.contrib.opt for more optimizers.
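A minimal usage sketch; `loss` is an assumed scalar loss tensor already in your graph. You can either call minimize directly, or split it into compute_gradients and apply_gradients when you want to inspect or transform the gradients first.
import tensorflow as tf
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
# One-step form: compute and apply the gradients in a single op.
train_op = opt.minimize(loss)
# Two-step form: inspect or transform the gradients before applying them.
grads_and_vars = opt.compute_gradients(loss)  # list of (gradient, variable) pairs
capped = [(tf.clip_by_value(g, -1.0, 1.0), v) for g, v in grads_and_vars if g is not None]
train_op = opt.apply_gradients(capped)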
TensorFlow provides functions to compute the derivatives for a given TensorFlow computation graph, adding operations to the graph. The optimizer classes compute derivatives on your graph automatically, but creators of new optimizers or expert users can call the lower-level functions below (a small sketch follows the list).
tf.gradients(ys, xs, grad_ys=None, name='gradients', colocate_gradients_with_ops=False, gate_gradients=False, aggregation_method=None, stop_gradients=None) // Constructs symbolic derivatives of ys with respect to xs. Returns a list of sum(dy/dx) for each x in xs.
tf.AggregationMethod // Lists the methods that can be used to aggregate gradient contributions when computing partial derivatives.
tf.stop_gradient // Useful when you need to compute a value with TF but must treat it as a constant during backprop, e.g. the EM algorithm, contrastive divergence training of Boltzmann machines, or adversarial training where no backprop should happen through the adversarial example construction.
tf.hessians // Adds operations to the graph to output the Hessian matrix of ys with respect to xs.
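A small sketch of the lower-level gradient functions; the variables and expressions are illustrative.
import tensorflow as tf
x = tf.Variable(3.0)
y = tf.Variable(2.0)
z = x * x * y + y
# Symbolic derivatives dz/dx and dz/dy; tf.gradients returns one summed gradient per x in xs.
dz_dx, dz_dy = tf.gradients(ys=z, xs=[x, y])
# Treat y as a constant: no gradient flows through tf.stop_gradient(y).
z2 = x * x * tf.stop_gradient(y)
[dz2_dx] = tf.gradients(ys=z2, xs=[x])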
TensorFlow provides several operations that you can use to add clipping functions to your graph. You can use these functions to perform general data clipping, but they're particularly useful for handling exploding or vanishing gradients (see the sketch after this list).
tf.clip_by_value // Clips tensor values to a specified min and max.
tf.clip_by_norm // Clips tensor values to a maximum L2-norm.
tf.clip_by_average_norm // Clips tensor values to a maximum average L2-norm.
tf.clip_by_global_norm // Clips values of multiple tensors by the ratio of the sum of their norms.
tf.global_norm // global_norm = sqrt(sum([l2norm(t)**2 for t in t_list]))
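A common pattern is global-norm gradient clipping between compute_gradients and apply_gradients. A sketch, assuming a scalar `loss` tensor already exists in the graph; the clip norm is illustrative.
import tensorflow as tf
opt = tf.train.AdamOptimizer(learning_rate=1e-3)
grads_and_vars = opt.compute_gradients(loss)  # `loss` is assumed to exist in the graph
grads, variables = zip(*grads_and_vars)
# Rescale all gradients together so that their global norm is at most 5.0.
clipped, global_norm = tf.clip_by_global_norm(grads, clip_norm=5.0)
train_op = opt.apply_gradients(list(zip(clipped, variables)))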
Decaying the learning rate
tf.train.exponential_decay // decayed_learning_rate = learning_rate * decay_rate ^ (global_step / decay_steps)
tf.train.inverse_time_decay // decayed_learning_rate = learning_rate / (1 + decay_rate * global_step / decay_steps)
tf.train.natural_exp_decay // decayed_learning_rate = learning_rate * exp(-decay_rate * global_step / decay_steps)
tf.train.piecewise_constant // Piecewise constant learning rate from provided boundaries and values.
tf.train.polynomial_decay // Polynomial decay from the initial learning rate to end_learning_rate.
tf.train.cosine_decay // Applies cosine decay to the learning rate.
tf.train.linear_cosine_decay // Applies linear cosine decay to the learning rate.
tf.train.noisy_linear_cosine_decay // Applies noisy linear cosine decay to the learning rate.
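These schedules take a global_step tensor and produce a learning-rate tensor that is passed to the optimizer. A sketch with exponential_decay; the hyperparameter values are illustrative and `loss` is assumed to exist in the graph.
import tensorflow as tf
global_step = tf.train.get_or_create_global_step()
# Start at 0.1 and multiply by 0.96 every 10000 steps (staircase=True decays in discrete jumps).
learning_rate = tf.train.exponential_decay(learning_rate=0.1, global_step=global_step,
                                           decay_steps=10000, decay_rate=0.96, staircase=True)
opt = tf.train.GradientDescentOptimizer(learning_rate)
# Passing global_step makes minimize() increment it after each update.
train_op = opt.minimize(loss, global_step=global_step)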
Some training algorithms, such as GradientDescent and Momentum, often benefit from maintaining a moving average of variables during optimization. Using the moving averages for evaluation often improves results significantly.
tf.train.ExponentialMovingAverage
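A sketch of the usual pattern: update shadow averages after each training step and restore them at evaluation time. `train_op` and the trainable variables are assumed to exist already; the decay value is illustrative.
import tensorflow as tf
ema = tf.train.ExponentialMovingAverage(decay=0.999)
# Run the shadow-variable update after every training step.
with tf.control_dependencies([train_op]):
    train_with_averages_op = ema.apply(tf.trainable_variables())
# At evaluation time, load the shadow values in place of the raw variables.
eval_saver = tf.train.Saver(ema.variables_to_restore())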
Coordinator and QueueRunner
See Threading and Queues for how to use threads and queues. For documentation on the Queue API, see Queues. A sketch of the usual pattern follows the list below.
tf.train.Coordinator // See docs
tf.train.QueueRunner // See docs
tf.train.LooperThread // See docs
tf.train.add_queue_runner // See docs
tf.train.start_queue_runners // See docs
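The standard pattern starts the queue-runner threads under a Coordinator and shuts them down cleanly. A sketch; a queue-based input pipeline and `train_op` are assumed to be defined in the graph.
import tensorflow as tf
with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)  # one thread per QueueRunner
    try:
        while not coord.should_stop():
            sess.run(train_op)
    except tf.errors.OutOfRangeError:
        pass  # the input queues ran out of data
    finally:
        coord.request_stop()
        coord.join(threads)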
See Distributed TensorFlow for more information about how to configure a distributed TensorFlow program; a minimal sketch follows the list below.
tf.train.Server // See docs
tf.train.Supervisor // See docs
tf.train.SessionManager // See docs
tf.train.ClusterSpec // See docs
tf.train.replica_device_setter // See docs
tf.train.MonitoredTrainingSession // See docs
tf.train.MonitoredSession // See docs
tf.train.SingularMonitoredSession // See docs
tf.train.Scaffold // See docs
tf.train.SessionCreator // See docs
tf.train.ChiefSessionCreator // See docs
tf.train.WorkerSessionCreator // See docs
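A hedged sketch of a between-graph replicated setup; the host addresses, job name, task index, and toy model are placeholders.
import tensorflow as tf
# Placeholder cluster definition; replace with your own hosts and ports.
cluster = tf.train.ClusterSpec({
    "ps": ["ps0.example.com:2222"],
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
})
server = tf.train.Server(cluster, job_name="worker", task_index=0)
# Variables are placed on the parameter servers, ops on this worker.
with tf.device(tf.train.replica_device_setter(cluster=cluster)):
    global_step = tf.train.get_or_create_global_step()
    w = tf.Variable(1.0, name="w")              # toy model
    loss = tf.square(w - 3.0)
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss, global_step=global_step)
# MonitoredTrainingSession handles session creation, recovery, and checkpointing.
with tf.train.MonitoredTrainingSession(master=server.target, is_chief=True) as sess:
    while not sess.should_stop():
        sess.run(train_op)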
Reading Summaries from Event Files
See Summaries and TensorBoard for an overview of summaries, event files, and visualization in TensorBoard.
tf.train.summary_iterator
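A sketch for reading scalar summaries back out of an event file; the file path is a placeholder.
import tensorflow as tf
# Placeholder path to an event file written by a tf.summary.FileWriter.
event_file = "/tmp/train/events.out.tfevents.1234567890.myhost"
for event in tf.train.summary_iterator(event_file):
    for value in event.summary.value:
        if value.HasField("simple_value"):  # scalar summaries only
            print(event.step, value.tag, value.simple_value)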
Hooks are tools that run during training or evaluation of the model, typically by attaching them to a MonitoredSession (see the sketch after this list).
tf.train.SessionRunHook // See docs
tf.train.SessionRunArgs // See docs
tf.train.SessionRunContext // See docs
tf.train.SessionRunValues // See docs
tf.train.LoggingTensorHook // See docs
tf.train.StopAtStepHook // See docs
tf.train.CheckpointSaverHook // See docs
tf.train.NewCheckpointReader // See docs
tf.train.StepCounterHook // See docs
tf.train.NanLossDuringTrainingError // See docs
tf.train.NanTensorHook // See docs
tf.train.SummarySaverHook // See docs
tf.train.GlobalStepWaiterHook // See docs
tf.train.FinalOpsHook // See docs
tf.train.FeedFnHook // See docs
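A sketch of passing hooks to a MonitoredTrainingSession; `loss` and `train_op` are assumed to exist, and the step count, logging interval, and checkpoint directory are placeholders.
import tensorflow as tf
hooks = [
    tf.train.StopAtStepHook(last_step=10000),                      # stop after 10000 global steps
    tf.train.LoggingTensorHook({"loss": loss}, every_n_iter=100),  # log the loss periodically
    tf.train.NanTensorHook(loss),                                  # raise if the loss becomes NaN
]
# Providing checkpoint_dir enables checkpoint and summary saving hooks automatically.
with tf.train.MonitoredTrainingSession(checkpoint_dir="/tmp/train", hooks=hooks) as sess:
    while not sess.should_stop():
        sess.run(train_op)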
Training Utilities
tf.train.global_step // See docs
tf.train.basic_train_loop // See docs
tf.train.get_global_step // See docs
tf.train.assert_global_step // See docs
tf.train.write_graph // See docs
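A small sketch of the global-step helpers and write_graph; the output directory is a placeholder and the model-building step is elided.
import tensorflow as tf
global_step_tensor = tf.train.get_or_create_global_step()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Read the current value of the global step from the session.
    step = tf.train.global_step(sess, global_step_tensor)
    # Serialize the graph definition to a text protobuf for inspection.
    tf.train.write_graph(sess.graph_def, "/tmp/train", "graph.pbtxt")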
Splitting sequence inputs into minibatches with state saving
Use tf.contrib.training.SequenceQueueingStateSaver or its wrapper tf.contrib.training.batch_sequences_with_states if you have input data with a dynamic primary time / frame count axis which you'd like to convert into fixed size segments during minibatching, and would like to store state in the forward direction across segments of an example.
* tf.contrib.training.batch_sequences_with_states(input_key, input_sequences, input_context, input_length, initial_states, num_unroll, batch_size, num_threads=3, capacity=1000, allow_small_batch=True, pad=True, make_keys_unique=False, make_keys_unique_seed=None, name=None)
// Creates batches from segments of sequential input.
* tf.contrib.training.NextQueuedSequenceBatch // CLASS that stores a deferred SequenceQueueingStateSaver's data.
* tf.contrib.training.SequenceQueueingStateSaver // CLASS used instead of a queue to split variable-length sequences into fixed-length segments and batch them into mini-batches.
Online data resampling
To resample data with replacement on a per-example basis, use tf.contrib.training.rejection_sample or tf.contrib.training.resample_at_rate.
For rejection_sample, provide a boolean Tensor describing whether to accept or reject. Resulting batch sizes are always the same.
For resample_at_rate, provide the desired rate for each example. Resulting batch sizes may vary.
If you wish to specify relative rates, rather than absolute ones, use tf.contrib.training.weighted_resample (which also returns the actual resampling rate used for each output example).
Use tf.contrib.training.stratified_sample to resample without replacement from the data to achieve a desired mix of class proportions that the TensorFlow graph sees. For instance, if you have a binary classification dataset that is 99.9% class 1, a common approach is to resample from the data so that the data is more balanced (a sketch follows the list below).
* tf.contrib.training.rejection_sample(tensors, accept_prob_fn, batch_size, queue_threads=1, enqueue_many=False, prebatch_capacity=16, prebatch_threads=1, runtime_checks=False, name=None)
// Creates batches by rejecting samples not accepted by a function.
* tf.contrib.training.resample_at_rate(inputs, rates, scope=None, seed=None, back_prop=False)
// Resamples inputs at the given rates, returning a new set of resampled inputs.
* tf.contrib.training.stratified_sample(tensors, labels, target_probs, batch_size, init_probs=None, enqueue_many=False, queue_capacity=16, threads_per_queue=1, name=None)
// Creates batches based on per-class probabilities.
* tf.contrib.training.weighted_resample(inputs, weights, overall_rate, scope=None, mean_decay=0.999, seed=None)
// An approximate weighted resampling of inputs; chooses inputs where the rate of selection is proportional to the weights.
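A hedged sketch of stratified_sample for rebalancing a two-class stream; the `example` and `label` tensors, class mixes, and batch size are illustrative assumptions.
import tensorflow as tf
# `example` and `label` are assumed to come from a per-example input pipeline;
# `label` is an int32 class id in {0, 1}.
target_probs = [0.5, 0.5]    # desired class mix of the output batches
init_probs = [0.999, 0.001]  # approximate class mix of the input stream
[example_batch], label_batch = tf.contrib.training.stratified_sample(
    tensors=[example], labels=label, target_probs=target_probs,
    batch_size=32, init_probs=init_probs, enqueue_many=False)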
Use tf.contrib.training.bucket or tf.contrib.training.bucket_by_sequence_length to stratify minibatches into groups ("buckets").
Use bucket_by_sequence_length with the argument dynamic_pad=True to receive minibatches of similarly sized sequences for efficient training via dynamic_rnn (see the sketch after this list).
* tf.contrib.training.bucket(tensors, which_bucket, batch_size, num_buckets, num_threads=1, capacity=32, bucket_capacities=None, shapes=None, dynamic_pad=False, allow_smaller_final_batch=False, keep_input=True, shared_name=None, name=None)
// Lazy bucketing of input tensors according to which_bucket.
* tf.contrib.training.bucket_by_sequence_length(input_length, tensors, batch_size, bucket_boundaries, num_threads=1, capacity=32, bucket_capacities=None, shapes=None, dynamic_pad=False, allow_smaller_final_batch=False, keep_input=True, shared_name=None, name=None)
// Lazy bucketing of inputs according to their length. Calls tf.contrib.training.bucket and, after subdividing the bucket boundaries, identifies which bucket an input_length belongs to and uses that.
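A hedged sketch of bucket_by_sequence_length; the `sequence` and `label` tensors, bucket boundaries, and batch size are illustrative assumptions.
import tensorflow as tf
# `sequence` and `label` are assumed to come from a per-example input pipeline;
# `sequence` has shape [time, feature_dim] with a dynamic time axis.
seq_len = tf.shape(sequence)[0]
# Group examples by length using the boundaries [10, 20, 40]; with dynamic_pad=True
# each batch is padded to the longest sequence in its bucket.
batched = tf.contrib.training.bucket_by_sequence_length(
    input_length=seq_len,
    tensors=[sequence, label],
    batch_size=32,
    bucket_boundaries=[10, 20, 40],
    dynamic_pad=True)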